A variety of problems in econometrics and machine learning, including instrumental variable regression and Bellman residual minimization, can be formulated as satisfying a set of conditional moment restrictions (CMR). We derive a general game-theoretic strategy for satisfying the CMR that scales to nonlinear problems, is amenable to gradient-based optimization, and is able to account for finite-sample uncertainty. We recover the approaches of Dikkala et al. and Dai et al. as special cases of our general framework, before detailing various extensions and how to efficiently solve the game defined by the CMR.
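To make the game-theoretic view concrete, here is a minimal numpy sketch (an illustration, not the paper's algorithm): linear instrumental variable regression solved by gradient descent-ascent on a moment-matching game, with the critic restricted to a single linear test function of the instrument.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
z = rng.normal(size=n)                     # instrument
u = rng.normal(size=n)                     # unobserved confounder
x = z + u + 0.1 * rng.normal(size=n)       # endogenous regressor
theta_true = 2.0
y = theta_true * x + u + 0.1 * rng.normal(size=n)

# CMR: E[(y - theta * x) | z] = 0.  With a linear critic f(z) = w * z,
# the game is  min_theta max_w  w * m(theta) - w^2 / 2,
# where m(theta) = E[z * (y - theta * x)] is the empirical moment.
theta, w, lr = 0.0, 0.0, 0.1
for _ in range(2000):
    m = np.mean(z * (y - theta * x))
    w += lr * (m - w)                      # ascent step for the critic
    theta += lr * w * np.mean(z * x)       # descent step for the learner

print(theta)  # close to the IV estimate E[zy]/E[zx], i.e. near 2.0
```

At the equilibrium the critic can no longer find a violated moment, so the moment condition itself is driven to zero.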
We consider imitation learning problems in which the expert has access to a per-episode context that is hidden from the learner, both in the demonstrations and at test time. While the learner might not be able to accurately reproduce expert behavior early on in an episode, by considering the entire history of states and actions they might eventually be able to identify the context and act as the expert would. We prove that on-policy imitation learning algorithms (with or without access to a queryable expert) are better equipped to handle these kinds of asymptotic problems than off-policy methods, and are able to avoid the latching behavior (naive repetition of past actions) that plagues the latter. We conduct experiments in a toy bandit domain which show that, in contrast to the uniformly good performance of on-policy approaches, off-policy approaches may or may not be able to asymptotically match the expert's performance. We demonstrate on several continuous control tasks that on-policy approaches are able to use history to identify the context, while off-policy approaches actually perform worse when given access to history.
Online imitation learning is the problem of how best to mimic expert behavior given access to the environment or an accurate simulator. Prior work has shown that in the infinite-sample regime, exact moment matching achieves value equivalence to the expert policy. However, in the finite-sample regime, even in the absence of optimization error, empirical variance leads to a performance gap that scales as $H^2/N$ for behavioral cloning and $H/\sqrt{N}$ for online moment matching, where $H$ is the horizon and $N$ is the size of the expert dataset. We introduce the technique of replay estimation to reduce this empirical variance: by repeatedly executing cached expert actions in a stochastic simulator, we compute a smoother estimate of the expert's visitation distribution to match. In the presence of general function approximation, we prove a meta-theorem that reduces the performance gap of our approach to the parameter estimation error of offline classification (i.e., learning the expert policy). In the tabular setting or with linear function approximation, our meta-theorem shows that the performance gap incurred by our approach achieves the optimal $\widetilde{O}\left(\min\left(H^{3/2}/N, H/\sqrt{N}\right)\right)$ dependence, under significantly weaker assumptions than prior work. We implement multiple instantiations of our approach on several continuous control tasks and find that we are able to significantly improve policy performance across a variety of dataset sizes.
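The replay idea can be sketched in a toy tabular setting (the chain environment and variable names here are invented for illustration): cache the expert's action at each demonstrated state, then re-execute those cached actions many times in the stochastic simulator, averaging the visitation counts into a smoother estimate than the handful of demonstrated trajectories alone would give.

```python
import numpy as np

rng = np.random.default_rng(1)
S, H = 5, 4                 # states 0..4, horizon 4

def step(s, a):
    """Stochastic chain simulator: action 1 advances w.p. 0.8, else stay."""
    if a == 1 and s < S - 1 and rng.random() < 0.8:
        return s + 1
    return s

expert_action = {s: 1 for s in range(S)}   # expert always advances

# A small expert dataset: cache the action taken at each visited state.
cached = {}
for _ in range(3):                          # only 3 demonstrations
    s = 0
    for _ in range(H):
        cached[s] = expert_action[s]
        s = step(s, cached[s])

# Replay estimation: re-execute the cached actions many times in the
# simulator and average visitation counts into a smoother estimate.
counts = np.zeros(S)
n_replays = 2000
for _ in range(n_replays):
    s = 0
    for _ in range(H):
        counts[s] += 1
        s = step(s, cached.get(s, 1))       # default action for unseen states
visitation = counts / counts.sum()
print(visitation)
```

The resulting `visitation` array is an estimate of the expert's state distribution built from many simulated rollouts rather than from three demonstrations.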
Iterative learning control (ILC) is a powerful technique for high-performance tracking in the presence of modeling errors in optimal control applications. There is extensive prior work demonstrating its empirical effectiveness in applications such as chemical reactors, industrial robots, and quadcopters. However, there is little prior theoretical work explaining its effectiveness even in the presence of large modeling errors, a regime in which optimal control methods using a misspecified model (MM) often perform poorly. Our work presents a theoretical study of the performance of both ILC and MM on Linear Quadratic Regulator (LQR) problems with unknown transition dynamics. We show that the suboptimality gap, measured with respect to the optimal LQR controller, is lower for ILC than for MM by higher-order terms that become significant in the regime of high modeling error. A key part of our analysis is a perturbation bound for the discrete Riccati equation in the finite-horizon setting, where the solution is not a fixed point and the error must be tracked using recursive bounds. We back our theoretical findings with experiments on a toy linear dynamical system with an approximate model, a nonlinear inverted pendulum system with misspecified mass, and a nonlinear planar quadrotor system in the presence of wind. The experiments show that, in terms of the cost of the computed trajectories, ILC significantly outperforms MM when modeling errors are high.
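The flavor of ILC can be seen in a minimal sketch (a generic first-order ILC law on a toy scalar linear system, not the paper's LQR analysis): the control sequence is corrected iteration by iteration using the tracking error measured on the real system, while the learner only knows a misspecified input gain.

```python
import numpy as np

H = 20
a, b_true = 0.5, 1.2      # true dynamics: x[t+1] = a * x[t] + b_true * u[t]
b_model = 1.0             # misspecified input gain used by the learner
r = np.ones(H)            # reference trajectory to track

def rollout(u):
    """Simulate the TRUE system and return the tracking error r - x."""
    x, xs = 0.0, []
    for t in range(H):
        x = a * x + b_true * u[t]
        xs.append(x)
    return r - np.asarray(xs)

# ILC: repeatedly roll out on the real system, then correct each input
# using the measured error, scaled by the (wrong) model gain.
u = np.zeros(H)
errs = []
for _ in range(50):
    e = rollout(u)
    errs.append(np.linalg.norm(e))
    u = u + e / b_model
print(errs[0], errs[-1])   # error shrinks despite the 20% gain error
```

Because the update contracts whenever the model gain has the right sign and order of magnitude, the tracking error vanishes even though a model-based controller computed from `b_model` alone would be biased.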
Sequential prediction problems such as imitation learning, where future observations depend on previous predictions (actions), violate the common i.i.d. assumptions made in statistical learning. This leads to poor performance in theory and often in practice. Some recent approaches (Daumé III et al., 2009; Ross and Bagnell, 2010) provide stronger guarantees in this setting, but remain somewhat unsatisfactory as they train either non-stationary or stochastic policies and require a large number of iterations. In this paper, we propose a new iterative algorithm, which trains a stationary deterministic policy, that can be seen as a no regret algorithm in an online learning setting. We show that any such no regret algorithm, combined with additional reduction assumptions, must find a policy with good performance under the distribution of observations it induces in such sequential settings. We demonstrate that this new approach outperforms previous approaches on two challenging imitation learning problems and a benchmark sequence labeling problem.
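The dataset-aggregation idea behind such an iterative algorithm can be sketched in a few lines (the 1-D environment and nearest-neighbor learner are invented for illustration): roll out the current policy, query the expert for the correct action at every state the policy actually visits, aggregate those labels, and retrain.

```python
import numpy as np

H = 8

def expert(s):
    """Expert drives the state toward 0: action -1 if s >= 0, else +1."""
    return -1.0 if s >= 0 else 1.0

def rollout(policy, s0):
    states, s = [], s0
    for _ in range(H):
        states.append(s)
        s = s + policy(s)          # dynamics: state moves by the action
    return states

D_s, D_a = [], []                  # aggregated dataset

def learned_policy(s):
    if not D_s:                    # before any data: a (bad) constant policy
        return 1.0
    i = int(np.argmin(np.abs(np.asarray(D_s) - s)))   # 1-NN classifier
    return D_a[i]

for _ in range(3):                 # iterative aggregation rounds
    for s0 in (-2.0, 2.0):
        for s in rollout(learned_policy, s0):
            D_s.append(s)          # states visited by the CURRENT policy,
            D_a.append(expert(s))  # labeled by the expert

test_states = [-2.0, -1.5, -1.0, -0.75, 0.75, 1.0, 1.5, 2.0]
agree = np.mean([learned_policy(s) == expert(s) for s in test_states])
print(agree)
```

The key point is that training data is collected under the distribution the learner itself induces, so the mismatch between training and test distributions that breaks naive behavioral cloning does not accumulate.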
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants (32%) stated that they did not have enough time for method development. 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and ensembling was performed by only 50% of the participants, based on either multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
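The patch-based training strategy reported above can be illustrated with a short numpy sketch (array sizes invented): when an image is too large to process at once, random fixed-size crops are sampled for each training step.

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((1024, 1024, 3))   # a "large" image
patch = 128                           # patch side length

def sample_patch(img, size, rng):
    """Crop a random size x size patch from an (H, W, C) image."""
    h, w = img.shape[:2]
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    return img[top:top + size, left:left + size]

batch = np.stack([sample_patch(image, patch, rng) for _ in range(8)])
print(batch.shape)  # (8, 128, 128, 3)
```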
Exploring the climate impacts of various anthropogenic emissions scenarios is key to making informed decisions for climate change mitigation and adaptation. State-of-the-art Earth system models can provide detailed insight into these impacts, but have a large associated computational cost on a per-scenario basis. This large computational burden has driven recent interest in developing cheap machine learning models for the task of climate model emulation. In this manuscript, we explore the efficacy of randomly wired neural networks for this task. We describe how they can be constructed and compare them to their standard feedforward counterparts using the ClimateBench dataset. Specifically, we replace the serially connected dense layers in multilayer perceptrons, convolutional neural networks, and convolutional long short-term memory networks with randomly wired dense layers and assess the impact on model performance for models with 1 million and 10 million parameters. We find average performance improvements of 4.2% across model complexities and prediction tasks, with substantial performance improvements of up to 16.4% in some cases. Furthermore, we find no significant difference in prediction speed between networks with standard feedforward dense layers and those with randomly wired layers. These findings indicate that randomly wired neural networks may be suitable direct replacements for traditional dense layers in many standard models.
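A minimal numpy sketch of the construction (an illustration, not the authors' implementation): draw a random DAG over a fixed node ordering, attach a small dense layer to each non-input node, and feed each node the sum of its predecessors' activations.

```python
import numpy as np

rng = np.random.default_rng(42)
n_nodes, width = 6, 16

# Random DAG over topologically ordered nodes: edge i -> j (i < j) w.p. 0.5;
# fall back to the immediate predecessor if no edge was drawn, so the
# graph stays connected.
preds = []
for j in range(1, n_nodes):
    p = [i for i in range(j) if rng.random() < 0.5]
    preds.append(p if p else [j - 1])

# One dense layer (weight matrix + ReLU) per non-input node.
Ws = [rng.normal(0, 0.3, (width, width)) for _ in range(n_nodes - 1)]

def forward(x):
    acts = [x]                                  # node 0 holds the input
    for j in range(1, n_nodes):
        h = sum(acts[i] for i in preds[j - 1])  # aggregate predecessors
        acts.append(np.maximum(0.0, Ws[j - 1] @ h))
    return acts[-1]                             # last node is the output

y = forward(rng.normal(size=width))
print(y.shape)  # (16,)
```

A serially connected multilayer perceptron is the special case where every node's only predecessor is the node immediately before it.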
Artificial intelligence methods including deep neural networks (DNN) can provide rapid molecular classification of tumors from routine histology with accuracy that matches or exceeds human pathologists. Discerning how neural networks make their predictions remains a significant challenge, but explainability tools help provide insights into what models have learned when corresponding histologic features are poorly defined. Here, we present a method for improving explainability of DNN models using synthetic histology generated by a conditional generative adversarial network (cGAN). We show that cGANs generate high-quality synthetic histology images that can be leveraged for explaining DNN models trained to classify molecularly-subtyped tumors, exposing histologic features associated with molecular state. Fine-tuning synthetic histology through class and layer blending illustrates nuanced morphologic differences between tumor subtypes. Finally, we demonstrate the use of synthetic histology for augmenting pathologist-in-training education, showing that these intuitive visualizations can reinforce and improve understanding of histologic manifestations of tumor biology.
Out-of-distribution generalization (OODG) is a longstanding challenge for neural networks. This challenge is quite apparent in tasks with well-defined variables and rules, where explicit use of the rules could solve problems independently of the particular values of the variables, but networks tend to be tied to the range of values sampled in their training data. Large transformer-based language models have pushed the boundaries on how well neural networks can solve previously unseen problems, but their complexity and lack of clarity about the relevant content in their training data obfuscates how they achieve such robustness. As a step toward understanding how transformer-based systems generalize, we explore the question of OODG in small scale transformers trained with examples from a known distribution. Using a reasoning task based on the puzzle Sudoku, we show that OODG can occur on a complex problem if the training set includes examples sampled from the whole distribution of simpler component tasks. Successful generalization depends on carefully managing positional alignment when absolute position encoding is used, but we find that suppressing sensitivity to absolute positions overcomes this limitation. Taken together our results represent a small step toward understanding and promoting systematic generalization in transformers.
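One generic way to suppress sensitivity to absolute positions (a common recipe, not necessarily the method used in the paper) is to shift the absolute position indices by a random offset for each training example, so that only relative positional structure remains stable across examples:

```python
import numpy as np

def sinusoidal_encoding(positions, d_model):
    """Standard sinusoidal absolute position encoding."""
    pos = np.asarray(positions, dtype=float)[:, None]
    i = np.arange(d_model // 2)[None, :]
    angles = pos / (10000.0 ** (2 * i / d_model))
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)

rng = np.random.default_rng(0)
seq_len, d_model, max_shift = 10, 32, 50

# Per-example random offset: the model never sees a fixed anchoring of
# token index 0 to encoding index 0, discouraging absolute-position cues.
offset = rng.integers(0, max_shift)
enc = sinusoidal_encoding(np.arange(seq_len) + offset, d_model)
print(enc.shape)  # (10, 32)
```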
Large language models have recently shown promising progress in mathematical reasoning when fine-tuned with human-generated sequences walking through a sequence of solution steps. However, the solution sequences are not formally structured and the resulting model-generated sequences may not reflect the kind of systematic reasoning we might expect an expert human to produce. In this paper, we study how to build stronger reasoning capability in language models using the idea of relational abstractions. We introduce new types of sequences that more explicitly provide an abstract characterization of the transitions through intermediate solution steps to the goal state. We find that models that are supplied with such sequences as prompts can solve tasks with a significantly higher accuracy, and models that are trained to produce such sequences solve problems better than those that are trained with previously used human-generated sequences and other baselines. Our work thus takes several steps toward elucidating and improving how language models perform on tasks requiring multi-step mathematical reasoning.